A machine learning approach to identifying important features for achieving step thresholds in individuals with chronic stroke

Background While many factors are associated with stepping activity after stroke, there is significant variability across studies. One potential reason to explain this variability is that there are certain characteristics that are necessary to achieve greater stepping activity that differ from others that may need to be targeted to improve stepping activity. Objective Using two step thresholds (2500 steps/day, corresponding to home vs. community ambulation and 5500 steps/day, corresponding to achieving physical activity guidelines through walking), we applied 3 different algorithms to determine which predictors are most important to achieve these thresholds. Methods We analyzed data from 268 participants with stroke that included 25 demographic, performance-based and self-report variables. Step 1 of our analysis involved dimensionality reduction using lasso regularization. Step 2 applied drop column feature importance to compute the mean importance of each variable. We then assessed which predictors were important to all 3 mathematically unique algorithms. Results The number of relevant predictors was reduced from 25 to 7 for home vs. community and from 25 to 16 for aerobic thresholds. Drop column feature importance revealed that 6 Minute Walk Test and speed modulation were the only variables found to be important to all 3 algorithms (primary characteristics) for each respective threshold. Other variables related to readiness to change activity behavior and physical health, among others, were found to be important to one or two algorithms (ancillary characteristics). Conclusions Addressing physical capacity is necessary but not sufficient to achieve important step thresholds, as ancillary characteristics, such as readiness to change activity behavior and physical health may also need to be targeted. This delineation may explain heterogeneity across studies examining predictors of stepping activity in stroke.


Introduction
Stroke is a leading cause of disability world-wide and results in numerous sequelae, including reduced walking ability and aerobic deconditioning [1,2]. This is problematic because reduced walking ability and aerobic deconditioning are associated with deficits in physical function [3,4], depression [5,6], and reduced self-efficacy [7]. As a result, many individuals with stroke are inactive [8] and not meeting physical activity recommendations to maximize health benefits [9,10]. In parallel, individuals with stroke often report improving their walking ability as a primary goal for rehabilitation [11] and clinicians spend considerable time on interventions to improve their walking [12]. Thus, two areas of particular relevance for the rehabilitation community are determining predictors of daily stepping activity that may inform whether an individual with stroke will be able to walk in the community or if they will be primarily home bound [3,13,14] and whether they will meet aerobic activity guidelines through walking [10,15]. The latter is salient for clinicians as reaching physical activity recommendations may have implications for future health outcomes [16][17][18][19].
Previous work has suggested that~2500 daily steps distinguishes between home versus community ambulators in individuals with stroke [3] and that~5500 daily steps is a reasonable target for individuals with disabilities to meet physical activity guidelines [10]. However, there has been significant heterogeneity in predictors of daily stepping activity after stroke. A recent meta-analysis including 26 studies and over 30 predictors of stepping activity post stroke found inconsistencies in the relevance of certain predictors [20]. This finding, in conjunction with the limited efficacy of interventions targeting daily stepping activity post stroke [21], suggests a need to better understand which predictors are most important for improving daily stepping activity after stroke.
To this end, the large number of variables that may influence walking activity after stroke requires analytical techniques with the ability to handle large, heterogeneous datasets. Machine learning techniques have this capacity as well as other advantages, including requiring fewer assumptions about the distributions of the data, numerous options for non-parametric models and dimensionality reduction techniques, and most notably their strong predictive capabilities [22][23][24]. Recent work has utilized machine learning to predict recovery of upper limb functioning [23,24] and functional outcomes after stroke with high accuracy [25]. In particular, one approach to determining which predictors may be most relevant is to utilize multiple different machine learning algorithms and compare predictors across algorithms [24]. This approach can help gain a better understanding of a target population and increase predictive power, as features that are shown to be important across mathematically unique algorithms likely represent fundamental characteristics of that population and are therefore important in predicting the outcome. Said differently, comparing relevant predictors across mathematically unique algorithms may help differentiate predictors that are most important for predicting the outcome from predictors that might be important for predicting the outcome in some individuals.
Despite these clear advantages, principled application of ML algorithms has not been used to understand the most important characteristics of stroke survivors who achieve stepping thresholds for community mobility status [3] or physical activity recommendations [10]. Thus, we had two objectives for this study. First, we aimed to determine which predictors are most important by applying three mathematically unique algorithms to a large dataset to predict achievement of community ambulation status (2500 steps/day) [3] and/or meeting physical activity recommendations (5500 steps/day) [10] and compare predictors across algorithms. We defined a predictor as important if it improved the model performance of all three algorithms. Based on previous evidence demonstrating that measures of physical capacity are likely the strongest predictors of daily stepping activity in stroke, we hypothesized that measures of physical capacity, specifically gait endurance [3,26] (6 Minute Walk Test) and gait speed [14,27] (10-Meter Walk Test), would be important predictors to all three algorithms. Based on evidence suggesting that balance self-efficacy [28,29] (Activities Specific Balance Confidence Scale) and environmental factors [30][31][32][33] (Area Deprivation Index) are also important for daily stepping activity post stroke but perhaps less important compared to physical capacity, we hypothesized that these measures would be important to one or two algorithms but not all three. Our second objective was to assess the prediction accuracy of these three different machine learning algorithms for each threshold.

Participants
Data was obtained from the baseline timepoint of a randomized clinical trial comparing the efficacy of specific interventions for improving daily walking activity in individuals with stroke [34]. Table 1 lists the eligibility criteria for this study. A more detailed description of these criteria can be found in the methods paper of the clinical trial [34]. All participants signed informed consent approved by the Human Subjects Review Board at the University of Delaware prior to study participation (IRB #878153-50). This work has been conducted according to the principles expressed in the Declaration of Helsinki. Inability to answer at least one orientation question correctly (item 1b of the National Institutes of Health Stroke Scale) and inability to follow at least one, two-step command (item 1c of the National Institutes of Health Stroke Scale)

Measures
We attempted to be as inclusive as possible when selecting measures to be included in the statistical analysis. Unless a measure had >5% missing data or was irrelevant to the outcome (discussed below), it was included in the analysis. Table 2 provides a description of each measure.

Daily stepping activity
To measure daily stepping activity, participants were provided with a FitBit TM at the baseline visit of the clinical trial. The FitBit TM has acceptable accuracy in detecting stepping activity in individuals with stroke [45][46][47][48]. Participants wore the device on their non-paretic ankle and were instructed to wear it for 7 days to reliably estimate daily stepping activity [49] and continue with their usual activity. Average steps per day (ASPD) was calculated by summing the total number of steps taken over all valid recording days and dividing this sum by the number of valid recording days.

Statistical analysis
Data processing. Fig 1 displays a data pipeline that describes the data and analysis procedures. First, the data was exported from the electronic database, REDCap [50], and initially comprised 283 individuals and 32 clinical and demographic variables. ASPD was used to determine stepping thresholds (home vs. community threshold of 2500 steps/day and the aerobic threshold of 5500 steps/day). After removing variables with more than 5% missing entries and variables that were irrelevant to the outcome (e.g., "patient ID"), 25 of the remaining 32 variables were used in our analyses. Of the 283 participants, 15 were excluded for having missing data in one or more of the 25 variables selected, leaving a total of 268 participants included in our analyses. Two individuals inspected the data for accuracy prior to analysis.
All analyses were conducted using custom-written code in the Python programming language and compiled with Spyder4, using the standard machine learning library sklearn [51]. The same procedures were repeated for both the home vs. community and aerobic thresholds. Briefly, the preprocessed data set from 268 participants was imported into a data frame. Our design matrix was composed of all 25 variables except for ASPD, which was used to compute our binary outcome variable, step threshold category. All data in our design matrix was normalized for stability using a min-max scaler, which uses the minimum value and range of the distribution to shift and scale the distribution, respectively, translating to the interval [0,1] while preserving the shape of the original distribution.
For the home versus community threshold, participants were assigned a label of 1 for home ambulator (ASPD < 2500) or 0 for community ambulator (ASPD � 2500). This resulted in a distribution of 58 (21.64%) home ambulators and 210 (78.36%) community ambulators. For the aerobic threshold, those whose ASPD were below the threshold of 5500 daily steps were given a label of 1, and those who met or exceeded the minimum aerobic threshold were labeled with a 0. The distribution in this case was 185 (69.03%) below the aerobic threshold and 83 (30.97%) above the threshold.
Drop-column procedure for feature importance. To address the first objective of this study, a two-stage procedure was used. The first stage was dimensionality reduction using lasso regularization. The purpose of this stage was two-fold: (1) dimensionality reduction to reduce redundancy and noise in our set of variables, and (2) reduce collinearity among the features, thus avoiding potential bias or conflation in measures of feature importance for strongly correlated variables in the second stage. Lasso was specifically chosen as it supported these objectives and allowed us to pass a smaller subset of features that held unique information about the target to the second stage. As an aside, we also considered using the elastic net Participants were instructed to walk as far as possible around a rectangular path for 6 minutes. The 6MWT is a valid and reliable test of walking endurance in persons with stroke [35,36]. Evidence suggests that the 6MWT is a strong predictor of walking activity in individuals with stroke [3,20,26].
Self-selected walking speed (m/s, SSWS) Participants traverse a 10-meter pathway in which the middle 6 meters are timed. Participants are instructed to walk at a leisurely pace as if they were going to get a drink from the refrigerator. The 10-Meter Walk Test is a valid and reliable test of gait speed in stroke [36].
Fastest walking speed (m/s, FWS) Similar to the above, except participants are instructed to walk at the fastest speed they safely can without running [36].
Speed modulation (m/s) Speed modulation measures the ability to change walking speeds and was calculated as FWS-SSWS.
The MoCA was collected as a global measure of cognitive impairment and assesses various domains, such as executive functioning and attention. The MoCA has acceptable sensitivity and specificity in detecting cognitive impairment in people with stroke [37].
Charlson Comorbidity Index (age adjusted, CCI) The CCI is a comorbidity index that inquires about other health conditions, such as myocardial infarction, diabetes, and congestive heart failure. The CCI has been shown to predict functional outcomes in individuals with stroke [38].
Patient Health Questionnaire-9 (PHQ-9) The PHQ-9 is a screening tool for depressive symptoms. It is a 9-item selfadministered questionnaire that asks participants to reflect on how often they have been bothered by specific problems over the past two weeks. The PHQ-9 is a valid and reliable measure of depressive symptoms in stroke [39].

Activities Specific Balance Confidence Scale (ABC)
The ABC is a 16-tem questionnaire that measures balance self-efficacy. Participants rate how confident they are performing various tasks on a scale of 0 ("no confidence") to 100 ("complete confidence"). Ratings for each item are averaged to produce an overall score. The ABC is a valid and reliable measure in persons with stroke [40] and has been shown to be related to daily stepping activity in stroke [28,29].
Body mass index (BMI, kg/m 2 ) Body mass index was calculated as the participant's weight in kilograms (kg) divided by height in meters (m) squared.
Age (years) Participant's age was a continuous variable quantified in years.
Time since initial stroke (months) This variable was calculated as the time between initial stroke onset and the date of the baseline evaluation of the clinical trial. Participants were required to be at least 6 months post stroke to be eligible.

Number of strokes
This was quantified as the number of strokes the participant suffered and was confirmed by imaging.

Number of medications (including supplements)
This variable was quantified as the number of medications the participant reported taking, including supplements.

Area Deprivation Index (state decile)
The ADI is a composite index of neighborhood disadvantage that includes various indicators of education, housing quality and crowding, poverty, and employment. The ADI provides a state decile ranking from 1 to 10 for each individual state, where 1 indicates the least disadvantaged and 10 the most disadvantaged [41,42].
The ADI also provides a national percentile ranking from 1-100, with 1 representing the lowest level of disadvantage and 100 representing the highest level of disadvantage [41,42]. Our previous work identified a significant relationship between the ADI and steps/day in people with chronic stroke [31].
Usual orthotic device Usual orthotic device was a categorical variable coded as 0 = no orthotic device, 1 = orthotic device.
Usual assistive device Usual assistive device was a categorical variable coded as 0 = no assistive device, 1 = assistive device.
Living situation Living situation was a categorical variable coded as 0 = living alone, 1 = living with a family member/significant other, 2 = living alone but has outside assistance daily, 3 = other.
(Continued ) penalty in the first phase, as it would satisfy our requirements, however this penalty was not supported for Linear SVM at the time of this analysis. It is worth noting, though, that we were able to run the analysis using the elastic net penalty for only LR in the first phase and that the end results were the same. The second stage involved computing a measure of feature importance for the remaining features following dimensionality reduction (Fig 1) [52]. Throughout the analysis, the performance metrics were assessed using Monte Carlo Cross-Validation (MCCV) which in each instance consisted of 100 randomly generated train-test-splits of the data where for each split, 70% of the data was used as a training set and the remaining 30% was used as the test set.
For the first stage, we used logistic regression (LR) and linear support vector machine (SVM) both with lasso regularization with optimized regularization parameters. To optimize these parameters, a grid search was used with two refinements, each around the parameter value that was found to produce the best average model performance using MCCV. Once the optimized regularization parameters were found, the following procedure was executed for both regularized models: first, the model was fit over 100 random 70-30 train-test-splits of the data and the 100 sets of weights (i.e., coefficients, see S1 File for more detail) for each variable were recorded. Then, 1000 bootstrap samples of size 100 were generated for each feature and an empirical 95% confidence interval for the median of the coefficients was computed. The 1000 bootstrap samples for each feature were generated from the original sample of 100 coefficients computed from fitting the model over the 100 train-test splits and applying the function random.chosen() from the python module "random". Variables for which 0 was included in the 95% confidence interval for both regularized models were then dropped, and only the remaining variables would be used in the second stage.
In the second stage, a measure of feature importance was computed for three different machine learning models: LR, SVM with a radial basis function (RBF) kernel, and random forest (RF). Our criteria for choosing these three models were that they had to be commonly-used machine learning algorithms, mathematically distinct, and previously shown to perform well

Measure Description
Marital status Marital status was a categorical variable coded as 0 = married, 1 = not married.
Work status Work status was a categorical variable coded as 0 = employed full-time, 1 = employed part-time, 2 = retired, 3 = unemployed (includes being on disability).

Gender
Gender was coded as 0 = male and 1 = female.

Readiness to change activity behavior
Readiness to change activity behavior was measured on an ordinal scale based on the Transtheoretical Model of Change [43,44]. Participants endorsed a response that best reflected their current stage of change. This variable was categorized as 1 = currently not active and do not intend on becoming active in the next 6 months, 2 = currently not active but thinking about starting to become active in the next 6 months, 3 = currently active sometimes but not regularly, 4 = currently active regularly but have only begun doing so within the last 6 months, 5 = currently active regularly and have done so for longer than 6 months.
Relapse in activity behavior Relapse in activity behavior was measured via self-report and categorized as 1 = experienced a relapse in activity levels, 2 = no relapse in activity levels [43,44]. https://doi.org/10.1371/journal.pone.0270105.t002 with clinical and biological data (see S1 File for additional details) [53]. While all three of these algorithms fit this criterion, it is important to note that, unlike parametric models like LR and SVM whose models can be written down in function form, nonparametric models like RF are known for having strong predictive power, while lacking interpretability. In this way, RF could be thought of as having insight that is more complex yet could be difficult to quantify. For this stage, we aimed to use a measure of feature importance that is both easily interpretable as well as uniformly applicable across the three algorithms. We therefore chose the drop column feature importance process. The drop column importance measure uses a chosen metric of model performance to quantify how much a given variable contributes to a model's performance, i.e., whether the variable helps, hurts, or has no effect on the performance of the model [54]. Once the metric for performance is chosen, the drop column feature importance can be computed using any model. The idea of this procedure is to fit the model using all variables first (i.e., the benchmark model) and take a measure of the model's performance using the chosen metric (i.e., the benchmark performance). To measure the influence of an individual feature on this metric of model performance, that feature will then be dropped from the training set and the model is then refitted. A measure of this new model's performance, the dropped column performance is then taken, and that feature's overall importance is taken to be the difference between the model's performance with and without that feature, i.e.: importance ¼ benchmark performance À dropped column performance This same procedure is then followed for each feature. Thus, if a particular feature improves the model's performance, this importance measure will be positive because the dropped column performance would be lower than the benchmark performance (in other words, the model performed worse when we removed that particular feature) and vice versa for features whose importance is negative. The intuition underlying this procedure is that positive features hold pertinent information about the dependent variable, as they contributed the most to correctly identifying those in the target class.
The drop column procedure was run with MCCV for all three algorithms using only the features that were retained after regularization. For consistency, a single list of 100 train-testsplits was randomly generated and used for all three algorithms in this step. For each algorithm, every feature was given a drop column importance for every train test-split, resulting in 100 importance measures associated with each algorithm for each feature. From these samples of 100 importance scores for each feature, 1000 bootstrap samples of size 100 were taken and an empirical 95% confidence interval for the mean importance score was computed for each feature. A feature was considered important to an algorithm if the 95% confidence interval for the mean importance of that feature was positive. To address our first objective, we compared predictors whose mean importance was positive across all three algorithms with the framework that predictors that met this criterion are likely critically important for predicting the outcome.
Model performance. For our second objective, we compared the performance of these models by examining their prediction accuracy for each step threshold. For the aerobic threshold, the metric of prediction accuracy used was standard accuracy. Due to the nature and severity of the class imbalance in the home versus community case, particularly that the target class was the minority [55], the metric of balanced accuracy, which takes the average of the recall (also called the sensitivity or True Positive Rate) and specificity (also called the True Negative Rate), was used to more accurately reflect the models' performance on the target class (see S1 File). With this class imbalance, balanced accuracy is better suited than standard accuracy to assess model performance because taking the average of recall and specificity takes the accurate classification of both classes into account. This avoids the case where a constant model (i.e., labeling all points as the majority class) would result in a conflated standard accuracy score, while mis-identifying the entire target class. For the home versus community threshold, balanced accuracy was used to evaluate model performance for all three algorithms (RF, SVM, and LR). These models were fit using the algorithms' default parameters from the sklearn documentation for the aerobic threshold. Due to the class imbalance in the home versus community threshold, a grid search was performed to optimize only the class weights parameter, class_weight, to either have the default setting of no weights or to have balanced class weights, which imposes a weight on each class during fitting that is inversely proportional to the class frequency.
To assess model performance, as well as validate the regularization step in the feature importance procedure, we computed the appropriate measure of prediction accuracy, again using MCCV over a single, newly generated set of 100 train-test-splits. This was done first using all the features, then using the set of features retained in the regularization phase. The goal of regularization is to eliminate redundant or uninformative variables, resulting in the use of fewer variables without the loss of any critical information. Achieving this would be evidenced by either no significant decrease, or even an improvement in the model's performance when using the variables retained after regularization versus the full set of variables. When assessing the models' performance in the case of both thresholds, we would like to see them perform better than the "uninformed" model, which would randomly assign positive and negative class labels based on the class distribution. In the case of the aerobic threshold, since the class distribution was 69.03% positive class (<2500 steps/day) and 30.97% negative class (�2500 steps/day) this would mean that the uninformed model would have a baseline accuracy score of (0.6903) 2 +(0.3097) 2 = 0.5724, or 57.24%. In the case of the home vs. community threshold, since we use balanced accuracy as the metric of model performance, one of the benefits of balanced accuracy is that, regardless of the class distributions, the baseline balanced accuracy of the uninformed model would be 0.5 or 50%. In this work, accuracy measures that were in between performance of the "uninformed" model and 100% accuracy were considered "moderate". The data and code associated with this work are available on Open Science Framework at https://osf.io/tgzpb/ Table 3 displays the demographic characteristics and summary step data of our full sample (n = 268).

Results for home versus community threshold
After the regularization process, LR and linear SVM dropped all but the same 6 features (6MWT, PHQ-9, readiness to change relapse score, usual assistive device, years of education, and ADI_N) with linear SVM also keeping 1 additional feature (SSWS). This resulted in 7 of the original 25 variables proceeding to the drop column phase.
For both LR and SVM, all 7 variables resulting from the regularization step were found to be important, with each feature contributing at least 6% improvement to the balanced accuracy score on average for both algorithms. For RF, only 1 of the 7 variables was found to be important (6MWT). Thus, the only feature that was important to all three algorithms for the home versus community threshold was 6MWT, suggesting that walking endurance is critically important for predicting community mobility status in stroke. Fig 2 displays the results of the drop column phase for this threshold. Fig 3A displays the model performances for the home versus community threshold. All models demonstrated a moderate level of test-set accuracy after optimizing for class weights, where LR and SVM were fit with balanced class weights and RF was fit with none. Note that for the home versus community classification problem, we used balanced accuracy to measure model performance, meaning the scores represent how the model accurately identified individuals on average across the home and community classes. Overall, RF performed the worst, achieving an average balanced accuracy score of 68.1% (SD 6.5%, range 50.6-87.3%) with selected features and 67.9% (SD 5.3%, range 58.6% -82.7%) when using all features. SVM followed, achieving an average balanced accuracy score of 77.3% (SD 5.4%, range 62.7% -90.6%) with selected features and 73.1% (SD 5.7%, range 60.8% -84.1%) when using all features. Finally, LR achieved the best overall balanced accuracy with an average score of 78.6% (SD 4.8%, range 68.7% -90.1%) with selected features and 75.6% (SD 5.5%, range 63.3% -87.6%) when using all features. The similarities in model performance accuracies when comparing a model with all features to the simplified model following regularization across all algorithms demonstrate that the regularization phase was effective in reducing the number of features while maintaining model performance accuracy.

Results for aerobic threshold
After the regularization process, both linear SVM and LR produced the exact same results, with 16 features retained and carried forwards into the second stage of the analysis (see Y axis of Fig 4).
The drop column procedure was then run using these 16 variables over 100 random traintest splits of the data for all three algorithms. From these results, the only feature found to be important to all three algorithms was speed modulation, suggesting that the ability to change walking speed is critically important for predicting the aerobic step threshold in stroke. The full results of the drop column procedure for the aerobic threshold are displayed in

Discussion
We determined that 6MWT was the only variable found to be an important predictor across all three algorithms in distinguishing home versus community ambulators. We also found that speed modulation was the only variable deemed important across all algorithms in distinguishing between stroke survivors who meet physical activity guidelines versus those who do not.
Considering that 6MWT and speed modulation are measures of a stroke survivor's physical capacity and were the only variables found to be important across all three mathematically unique algorithms led us to conclude that measures of physical capacity are primary characteristics that distinguish between groups of stroke survivors using these binary step thresholds. Our finding that 6MWT was a primary characteristic for distinguishing between home versus community ambulators is in agreement with previous studies demonstrating that the 6MWT discriminates between functional walking categories in individuals with stroke [3,26]. This finding is also aligned with several studies reporting that measures of physical capacity are, in general, important predictors of functional walking categories [3,14,26,27] and physical activity [20] in individuals with stroke. A meta-analysis by Thilarajah et al [20] found that the 6MWT explains 37% of the variance in physical activity in individuals with stroke. Our results support this finding that the 6MWT is critically important for distinguishing between home and community ambulation in people with stroke. As past work has generally utilized a single analytical approach, our results extend this work by demonstrating that the 6MWT prevails as an important predictor across multiple settings and further suggests that walking endurance is important for predicting whether a stroke survivor will achieve community mobility status using a 2500-step threshold. These results are clinically important and suggest that clinicians should target the individual's walking endurance to achieve the goal of community ambulation. However, as can be observed in Fig 2, additional features were deemed important across some algorithms, but not all. The Area Deprivation Index, Patient Health Questionnaire-9, readiness to change relapse score, self-selected walking speed, assistive device use, and years of education were found to be important for both SVM and LR, but not RF. It is interesting to note that feature importance for RF was quite different from LR and SVM for this threshold. Unlike LR and SVM, RF is an ensemble method and was intentionally chosen to align with our objective of selecting models that were mathematically distinct. Therefore, this difference can be attributed, in part, to the mathematical differences among the models. Additionally, the fact that all seven of the features selected by the first phase were found to be important to LR and SVM suggest that the loss of any of these variables would come at a cost to model performance. For RF, however, it would appear that, as long as 6MWT is included, fewer variables than these seven could be used without sacrificing accuracy. Selecting models that were mathematically unique also enabled us to discern features that were most important across a range of settings. Taken together, these findings suggest that addressing walking endurance is likely necessary but not sufficient for achieving a 2500-step threshold and that ancillary features (defined as predictors that were important to one or two algorithms, but not all three), including depressive symptoms and readiness to change activity behavior, may need to be addressed to fully achieve this step threshold. For the aerobic threshold, speed modulation most consistently improved the prediction when using standard accuracy. Previous work has demonstrated that speed modulation is related to fall status [56] and daily walking activity [57] in older adults. Here we showed that speed modulation is also an important predictor of whether a stroke survivor will achieve a daily step threshold reflecting physical activity guidelines. Speed modulation was not evaluated in the work by Thilarajah et al [20]. This finding therefore provides new perspective on the importance of evaluating the ability to change walking speeds when attempting to improve daily stepping activity in people with stroke.
Since there is no uniform consensus on what magnitude of class imbalance warrants the use of standard versus balanced accuracy, we also assessed whether using balanced accuracy for the aerobic threshold would change our result. In this analysis, we found that both 6MWT and speed modulation were primary characteristics for achieving the aerobic step threshold (see S1 Fig). This similarity in result regardless of the use of standard accuracy (in which the 6MWT was close to being considered a primary characteristic) or balanced accuracy increases our confidence in our methodology and suggests that measures of physical capacity are critically important for achieving the aerobic step threshold.
However, our results demonstrate that targeting physical capacity is likely necessary but not sufficient for achieving a 5500-step threshold. This is supported by the fact that some algorithms, but not all, found that ancillary features were important in predicting the outcome. For example, both LR and SVM found that readiness to change activity behavior and the number of medications taken, were also important for predicting the aerobic step threshold, suggesting that readiness to change activity behavior and physical health status, may also need to be addressed to fully achieve this threshold.
We examined predictors of both 2500 and 5500-step thresholds as past work suggests that these thresholds likely provide different information. The 2500-step threshold is intended to distinguish between stroke survivors who walk primarily within their home setting from those who walk within their community [3]. As community ambulation typically involves greater walking distances [58], sufficient walking endurance is likely necessary to traverse community distances, lending credence to our finding that 6MWT is a primary characteristic of the home versus community threshold. In contrast, the 5500-step threshold is intended to distinguish between those who meet physical activity guidelines through walking and those who do not [10]. Physical activity guidelines are expressed in terms of exercise intensity and increasing one's walking speed is one approach to increasing exercise intensity [9]. Therefore, it logically follows that the ability to modulate walking speed is a primary characteristic of those who meet intensity-based physical activity guidelines through walking. However, when using balanced accuracy for the aerobic threshold, we found the 6MWT was also a primary characteristic and increasing one's exercise duration is another approach to meeting physical activity guidelines. We therefore conclude that measures of physical capacity are primary characteristics necessary to achieve these step thresholds and that associated ancillary characteristics may need to be addressed to fully achieve these step thresholds. This conclusion is largely in agreement with that of Thilarajah et al [20] who found that measures of physical capacity, which were identified as primary characteristics in the current work, and psychosocial factors, which were identified as ancillary characteristics in this work, are important for daily stepping activity in people with stroke. Our results extend upon this work by suggesting there may be a prioritization or hierarchy of these many predictors when attempting to improve daily walking activity in individuals with stroke.
Importantly, these results should not be interpreted as the 6MWT and speed modulation represent the best single predictors of home versus community or aerobic step thresholds (see S2 Fig for predictive accuracy of 6MWT and speed modulation alone in predicting thresholds).
For example, Fig 2 shows that for logistic regression (LR), the Area Deprivation Index (ADI_N) was more important for model performance than the 6MWT. This kind of inference would require determining that they hold the greatest predictive power over the other variables, which was not the purpose of this study. On the contrary, we defined a measure as important if it improved the model performance for all three algorithms (i.e., positive value for the drop column feature importance). Any measure that met this criterion was considered a primary characteristic essential for characterizing the step threshold. What our results therefore indicate is that 6MWT and speed modulation are primary characteristics that distinguish stroke survivors who do and do not meet these step thresholds. Thus, our results extend upon literature examining predictors of stepping activity in stroke by demonstrating that there are features that are primary characteristics that should be addressed as well as ancillary features that may need to be addressed based on the unique circumstances of the individual.
With respect to model performance, RF was outperformed by SVM and LR in the home versus community threshold, with LR marginally achieving the best results over SVM. In the case of the aerobic threshold, RF and LR outperformed SVM with average accuracy scores within less than 1% of each other both with and without selected features. It is important to note that we did minimal hyperparameter tuning in this assessment, only optimizing the parameter for class weights in the case of the home versus community threshold due to the class imbalance. After tuning just this single hyperparameter, LR and SVM saw improvements in average performance of 7.3-13.5% after tuning, where RF's performance did not improve with balanced class weights. It is possible that with more exhaustive hyperparameter tuning, model performance in the case of both thresholds could be improved.
Importantly, model performance accuracies were similar or better for a model with only features retained after regularization compared to a model with all features for all three algorithms and both thresholds. This validated the regularization phase, as we were able to reduce the number of variables to a relevant subset without compromising model performance. More directly, we were able to predict the home versus community and aerobic threshold classifications as well with 7 and 16 features, respectively, as we were when using all 25 features.

Limitations
First, the class imbalance in the case of the home versus community threshold may have limited the performance of our algorithms. Consequently, model performance metrics across samples could suffer from high variability. In addition, with only 58 home ambulators, those in our sample may not be representative of the stroke population, as individuals were excluded if they walked at a self-selected gait speed of <0.3 m/s. However, we aimed to address this limitation by using balanced accuracy as our metric of model performance for the home versus community threshold to avoid potential scenarios resulting in inflated measures of accuracy caused by the class imbalance (e.g., a constant model classifying all points as community ambulators). Second, although we used step thresholds endorsed by previous studies, we recognize that there are likely individuals who may not fit these exact criteria. For example, stroke survivors who achieve the step threshold to be considered community ambulators may take these steps primarily within their home environment.

Conclusions/implications
A stroke survivor's physical capacity to walk is a primary characteristic that can be used to determine whether they will achieve step thresholds corresponding to home versus community ambulation and physical activity guidelines. However, measures of physical capacity were not necessarily the single best predictors of achieving these thresholds. Thus, addressing physical capacity is necessary but not sufficient for achieving these thresholds and ancillary factors, such as readiness to change activity behavior and physical health status, among others, may need to be addressed based on the individual's unique clinical presentation. Future work on larger sample sizes that contain greater representation of home ambulators and other potentially relevant variables, such as fatigue and quality of life, is necessary.